icomp 2024
UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms
Levin, Ilya, Belomestny, Denis, Naumov, Alexey, Samsonov, Sergey
Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). Yet even precise knowledge of the value function $V^{\pi}$ corresponding to a policy $\pi$ does not provide reliable information on how far the policy $\pi$ is from the optimal one. We present a novel model-free upper value iteration procedure $({\sf UVIP})$ that allows us to estimate the suboptimality gap $V^{\star}(x) - V^{\pi}(x)$ from above and to construct confidence intervals for $V^\star$. Our approach relies on upper bounds to the solution of the Bellman optimality equation obtained via a martingale approach. We provide theoretical guarantees for ${\sf UVIP}$ under general assumptions and illustrate its performance on a number of benchmark RL problems.
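To make the quantity $V^{\star}(x) - V^{\pi}(x)$ concrete, the following minimal sketch computes it exactly on a hypothetical two-state, two-action deterministic MDP. It is not the UVIP procedure: it assumes the transition model and rewards are known and uses plain dynamic programming, which is exactly what a model-free method avoids. The MDP, the discount factor, and all names (`NEXT_STATE`, `REWARD`, `optimal_values`, `policy_values`) are assumptions made for illustration.

```python
# Toy illustration of the suboptimality gap V*(x) - V^pi(x).
# NOT the UVIP procedure: the model is assumed known here.

GAMMA = 0.5                      # discount factor (assumed)
NEXT_STATE = [[0, 1], [1, 0]]    # NEXT_STATE[s][a]: deterministic successor
REWARD = [[0.0, 1.0], [2.0, 0.0]]  # REWARD[s][a]: immediate reward


def q_value(v, s, a):
    """One-step lookahead value of taking action a in state s."""
    return REWARD[s][a] + GAMMA * v[NEXT_STATE[s][a]]


def optimal_values(n_iter=200):
    """Value iteration on the Bellman optimality equation."""
    v = [0.0, 0.0]
    for _ in range(n_iter):
        v = [max(q_value(v, s, a) for a in (0, 1)) for s in (0, 1)]
    return v


def policy_values(policy, n_iter=200):
    """Iterative policy evaluation for a deterministic policy."""
    v = [0.0, 0.0]
    for _ in range(n_iter):
        v = [q_value(v, s, policy[s]) for s in (0, 1)]
    return v


v_star = optimal_values()
v_pi = policy_values([0, 0])     # policy that always takes action 0
gap = [vs - vp for vs, vp in zip(v_star, v_pi)]
print("V*  :", v_star)
print("V^pi:", v_pi)
print("gap :", gap)
```

In this toy case the policy is optimal in state 1 but loses value in state 0, which cannot be read off from $V^{\pi}$ alone; an upper bound on $V^{\star}$ is what turns $V^{\pi}$ into a statement about suboptimality.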
Tensor-Train Point Cloud Compression and Efficient Approximate Nearest-Neighbor Search
Novikov, Georgii, Gneushev, Alexander, Kadeishvili, Alexey, Oseledets, Ivan
Nearest-neighbor search in large vector databases is crucial for various machine learning applications. This paper introduces a novel method using tensor-train (TT) low-rank tensor decomposition to efficiently represent point clouds and enable fast approximate nearest-neighbor searches. We propose a probabilistic interpretation and utilize density estimation losses like Sliced Wasserstein to train TT decompositions, resulting in robust point cloud compression. We reveal an inherent hierarchical structure within TT point clouds, facilitating efficient approximate nearest-neighbor searches. In our paper, we provide detailed insights into the methodology and conduct comprehensive comparisons with existing methods. We demonstrate its effectiveness in various scenarios, including out-of-distribution (OOD) detection problems and approximate nearest-neighbor (ANN) search tasks.
Exploring Applications of State Space Models and Advanced Training Techniques in Sequential Recommendations: A Comparative Study on Efficiency and Performance
Obozov, Mark, Baderko, Makar, Kulibaba, Stepan, Kutuzov, Nikolay, Gasnikov, Alexander
Recommender systems aim to estimate dynamically changing user preferences and the sequential dependencies between historical user behaviour and metadata. Although transformer-based models have proven effective in sequential recommendation, their state grows in proportion to the length of the processed sequence, which makes them expensive in terms of memory and inference cost. Our research focused on three promising directions in sequential recommendation: enhancing speed through State Space Models (SSMs), which can achieve SOTA results in this domain with lower latency, memory, and inference costs, as proposed by Liu et al. (2024); improving recommendation quality with Large Language Models (LLMs) via Monolithic Preference Optimization without Reference Model (ORPO); and implementing adaptive batch- and step-size algorithms to reduce costs and accelerate training. Recently, transformer models have been shown to be effective in sequential recommendation tasks, both as the backbone of larger models (Kang & McAuley, 2018) and as individual LLMs (Li et al., 2023; Yue et al., 2023). Despite their success, attention-based methods face inference inefficiencies due to the quadratic computational complexity inherent in attention operators and their rapid state growth, which is proportional to the sequence length.
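The contrast in state growth can be sketched with a toy example. The code below compares the size of a key/value cache that grows with every processed item against a fixed-size recurrent state updated by a diagonal linear recurrence $h_t = a \odot h_{t-1} + b\,x_t$, $y_t = \langle c, h_t \rangle$. This is an illustrative assumption, not the architecture of Liu et al. (2024) or of any specific SSM; the function names and parameters are hypothetical.

```python
# Toy comparison of inference-time state: growing attention cache
# versus a fixed-size linear recurrent state. Illustrative only.


def attention_cache_sizes(seq_len, d_model):
    """Floats held in a key/value cache after each processed item."""
    cache = []
    sizes = []
    for _ in range(seq_len):
        key = [0.0] * d_model      # placeholder key vector
        value = [0.0] * d_model    # placeholder value vector
        cache.append((key, value))
        sizes.append(2 * d_model * len(cache))
    return sizes


def ssm_scan(inputs, a, b, c):
    """Diagonal linear recurrence over scalar inputs.

    Returns the outputs and the (constant) number of state entries.
    """
    h = [0.0] * len(a)
    outputs = []
    for x in inputs:
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return outputs, len(h)


cache_sizes = attention_cache_sizes(seq_len=4, d_model=8)
outputs, state_size = ssm_scan(
    [1.0, 0.0, 0.0], a=[0.5, 0.5], b=[1.0, 1.0], c=[1.0, 1.0]
)
print("attention cache floats per step:", cache_sizes)
print("recurrent state entries:", state_size)
print("recurrent outputs:", outputs)
```

The cache size increases linearly with the number of processed items, while the recurrent state keeps the same number of entries regardless of sequence length, which is the source of the memory and latency advantage described above.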